Morphology Generation for Statistical Machine Translation using Deep Learning Techniques
نویسندگان
چکیده
Morphology in unbalanced languages remains a big challenge in the context of machine translation. In this paper, we propose to de-couple machine translation from morphology generation in order to better deal with the problem. We investigate the morphology simplification with a reasonable trade-off between expected gain and generation complexity. For the Chinese-Spanish task, optimum morphological simplification is in gender and number. For this purpose, we design a new classification architecture which, compared to other standard machine learning techniques, obtains the best results. This proposed neural-based architecture consists of several layers: an embedding, a convolutional followed by a recurrent neural network and, finally, ends with sigmoid and softmax layers. We obtain classification results over 98% accuracy in gender classification, over 93% in number classification, and an overall translation improvement of 0.7 METEOR.
منابع مشابه
Abu-MaTran at WMT 2016 Translation Task: Deep Learning, Morphological Segmentation and Tuning on Character Sequences
This paper presents the systems submitted by the Abu-MaTran project to the Englishto-Finnish language pair at the WMT 2016 news translation task. We applied morphological segmentation and deep learning in order to address (i) the data scarcity problem caused by the lack of in-domain parallel data in the constrained task and (ii) the complex morphology of Finnish. We submitted a neural machine t...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملComprehensive Analysis of Dense Point Cloud Filtering Algorithm for Eliminating Non-Ground Features
Point cloud and LiDAR Filtering is removing non-ground features from digital surface model (DSM) and reaching the bare earth and DTM extraction. Various methods have been proposed by different researchers to distinguish between ground and non- ground in points cloud and LiDAR data. Most fully automated methods have a common disadvantage, and they are only effective for a particular type of surf...
متن کاملStory Generation from Sequence of Independent Short Descriptions
Existing Natural Language Generation (nlg) systems are weak AI systems and exhibit limited capabilities when language generation tasks demand higher levels of creativity, originality and brevity. Eective solutions or, at least evaluations of modern nlg paradigms for such creative tasks have been elusive, unfortunately. is paper introduces and addresses the task of coherent story generation fr...
متن کاملThe AFRL-MITLL WMT16 News-Translation Task Systems
This paper describes the AFRL-MITLL statistical machine translation systems and the improvements that were developed during the WMT16 evaluation campaign. As part of these efforts we have adapted a variety new techniques to our previous years’ systems including Neural Machine Translation, additional out-of-vocabulary transliteration techniques, and morphology generation.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1610.02209 شماره
صفحات -
تاریخ انتشار 2016